ABSTRACT


The SSU rRNA gene is one of the most widely utilized loci for phylogenetic inference
among eukaryotic organisms. Although 18S rDNA sequences have an average length of 1800
to 1900 bp, several unusually large sequences have been reported. After examining GenBank
sequences and 180 new 18S rRNA sequences from several metazoan groups, we report many
other extraordinary sequences ranging from ca. 1350 bp (in symphylan myriapods) to ca.
3300 bp (in some strepsipteran insects). Myriapods are particularly interesting, having
independently evolved extraordinary sequences in the four classes (Chilopoda, Diplopoda,
Symphyla, and Pauropoda). An insertion event of ca. 300 bp has been detected in all but the most
basal family of geophilomorphan centipedes. Other major insertions are also found in other 
arthropod groups, in onychophorans, molluscs, chaetognaths, echinoderms, and parasitic platy- 
helminths. The use of information derived from secondary structure predictions combined with 
a new method to analyze DNA sequence data without multiple sequence alignments is pro- 
posed as a solution for analyzing sequence data that possess alternatively conservative and 
variable regions, such as ribosomal genes. 


INTRODUCTION

The small-subunit ribosomal RNA gene is
one of the most widely utilized loci in
phylogenetic inference among eukaryotic
organisms (e.g., van de Peer and De Wachter,
1997), especially for the examination of
phylogenetic relationships among metazoans.
Ribosomal genes play a fundamental role in the
synthesis of proteins in eukaryotic and
prokaryotic cells. The SSU rRNA locus (in
particular the 18S rRNA gene) has been widely
used to infer phylogenetic relationships for
reasons that have been elaborated elsewhere
(e.g., Sogin, 1991; Adoutte and Philippe, 
1993). The main rationale is (1) that this 
gene is long enough to provide phylogenetic 
information, (2) it is among the slowest 
evolving sequences found in all living organ- 
isms, and (3) different regions of the mole- 
cule evolve at different rates. For these rea- 
sons the 18S rRNA gene allows the inference 
of phylogenetic history across a broad taxo- 
nomic range. Moreover, the presence of 
many copies per genome and their homoge- 
nization through concerted evolution (Dover, 
1982; Hillis and Dixon, 1991) greatly reduce 
intraspecific variation and facilitate DNA 
amplification via PCR. 

The SSU rRNA gene of most organisms 
is between 1800 and 1900 bp in total length. 
However, exceptional cases of long SSU 
rRNA genes are not uncommon (see Crease 
and Colbourne, 1998 for some examples), 
with certain regions being particularly prone 
to sequence variability (both in nucleotide 
substitutions and in sequence length). 

In some cases, the SSU rRNA gene con- 
tains type I introns that are spliced out of the 
mature rRNA molecule (e.g., De Wachter et 
al., 1992; Wilcox et al., 1992; Bhattacharya 
et al., 1994). However, there are many long 
SSU rRNA genes that do not contain introns 
(e.g., Hinkle et al., 1994). Some of these long 
SSU rRNA genes have been reported for 
metazoans: Acyrthosiphon pisum 2469 bp 
(Kwon et al., 1991); Xenos vesparum 3316 
bp (Chalwatzis et al., 1995); Daphnia pulex 
2293 bp (Crease and Colbourne, 1998); Eu- 
peripatoides leuckarti 2206 bp (Aguinaldo et 
al., 1997); Echinococcus granulosus 2394 bp 
(Picón et al., 1996). To our knowledge, type
I introns have not been found in any meta- 
zoan taxon. 

Other unusual phenomena have been re- 
ported for some organisms. The presence of 
more than one type of SSU rRNA gene has 
been found in the protozoan Plasmodium 
(Gunderson et al., 1987; Waters et al., 1989; 
Qari et al., 1994). Two types of 18S rRNA 
genes have also been reported in a group of 
metazoans, the freshwater and terrestrial pla- 
narians of the family Dugesiidae (Carranza 
et al., 1996, 1998a, 1998b). 

Because alternate conserved and nonconserved
regions are present in the SSU rRNA,
sequence alignments are challenging. Some
authors have based alignments on informa- 
tion derived from secondary structure models 
(see Kjer, 1995). However, different models 
(i.e., Gutell et al., 1985; Hendriks et al., 
1988a, 1988b; Neefs and De Wachter, 1990; 
Van de Peer et al., 1998) can lead to different 
phylogenetic results (Winnepenninckx and 
Backeljau, 1996). Moreover, the variable
regions cannot be aligned reliably, and thus
cannot be used as phylogenetic information, even if
secondary structure predicts that a certain 
string of nucleotides is homologous (e.g., a 
variable loop, situated between a conserved 
stem). 

The existence of different models can be 
attributed to problems in inferring secondary 
(and tertiary and quaternary) structures via 
direct observation. This was pointed out by 
Gutell and collaborators, who acknowledged
that “any rigorous search for a secondary
structure model for 16-S rRNA would
necessitate use of the comparative method”
(Gutell et al., 1985: 156). These authors also 
mentioned that the direct approach of X-ray 
crystallography remains only a remote pos- 
sibility, because of the difficult procedure for 
preparing high-quality crystals of ribosomes 
or their subunits, together with the added 
problem that tertiary structure may be di- 
rected and/or stabilized by quaternary inter- 
actions (Gutell et al., 1985). 

Where direct approaches have failed to in- 
fer the secondary structure of the SSU rRNA 
gene, some indirect approaches have suc- 
ceeded. The comparative method has been 
successful to the point of building a database 
of hundreds of secondary structures of SSU 
rRNA. This database, the SSU ribosomal
subunit RNA database
(http://rrna.uia.ac.be/rra/ssu [Van de Peer et
al., 1998]), is maintained and updated
constantly, and incorporates probably the
largest comparative molecular data set.

In the present study, we report some ex- 
traordinary examples showing enormous var- 
iation in length among 18S rRNA sequences 
of different metazoan taxa. We also comment 
on the regions of the molecule that concen- 
trate the largest variation. Finally we explore 
a novel method that facilitates the use of
variable regions of the molecule, which cannot
be accommodated into standard phylogenetic
analyses using sequence alignments.


MATERIALS AND METHODS 


Genomic DNA samples were obtained
from fresh, frozen, or ethanol-preserved
tissues in a solution of guanidinium
thiocyanate homogenization buffer following a
modified protocol for RNA extraction
(Chirgwin et al., 1979). The 18S rDNA locus
was PCR-amplified in two or three overlapping
fragments of about 950, 900, and 850
bp each, using primer pairs 1F–5R,
3F–18Sbi, and 5F–9R, respectively. Primers used
in amplification and sequencing were
described in Giribet et al. (1996, 1999).
Amplification was carried out in a 50 µL volume
reaction, with 1.25 units of AmpliTaq DNA
Polymerase (Perkin Elmer), 200 µM of
dNTPs, and 1 µM of each primer. The PCR
program consisted of an initial denaturing step
at 94°C for 60 seconds, 35 amplification
cycles (94°C for 15 sec, 49°C for 15 sec, 72°C
for 15 sec), and a final step at 72°C for 6
minutes in a GeneAmp PCR System 9700
(Perkin Elmer).

PCR samples were purified with the
GENECLEAN III kit (BIO 101 Inc.) and
directly sequenced using an automated ABI
Prism 377 DNA sequencer. Cycle-sequencing
with AmpliTaq DNA Polymerase, FS
(Perkin-Elmer) using dye-labeled terminators
(ABI PRISM BigDye Terminator Cycle
Sequencing Ready Reaction Kit) was
performed in a GeneAmp PCR System 9700
(Perkin Elmer). Amplification was carried
out in a 10 µL volume reaction: 4 µL of
Terminator Ready Reaction Mix, 10–30 ng/mL
of PCR product, 5 pmoles of primer, and
dH2O to 10 µL. The cycle-sequencing
program consisted of a step at 94°C for 3
minutes, 25 sequencing cycles (94°C for 10 sec,
50°C for 5 sec, 60°C for 4 min), and a rapid
thermal ramp to 4°C and hold. The
BigDye-labeled PCR products were
isopropanol-precipitated following the
manufacturer's protocol.
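
As a rough check on the two thermal programs described above, the programmed hold times can be added up. The sketch below is only that arithmetic; the helper name `program_seconds` is hypothetical, and ramp times between temperatures are not given in the text and are therefore left out.

```python
# Hold-time arithmetic for the two thermal programs described in the text.
# Ramp times between temperatures are NOT included (the text does not give them).

def program_seconds(initial_s, cycles, step_seconds, final_s=0):
    """Programmed hold time: initial step + cycles * per-cycle steps + final step."""
    return initial_s + cycles * sum(step_seconds) + final_s

# PCR amplification: 94 C for 60 s; 35 x (15 s, 15 s, 15 s); 72 C for 6 min.
pcr_s = program_seconds(60, 35, [15, 15, 15], final_s=6 * 60)

# Cycle sequencing: 94 C for 3 min; 25 x (10 s, 5 s, 4 min).
# The final 4 C hold is open-ended and is not counted.
seq_s = program_seconds(3 * 60, 25, [10, 5, 4 * 60])

print(f"PCR: {pcr_s} s ({pcr_s / 60:.2f} min)")
print(f"Cycle sequencing: {seq_s} s ({seq_s / 60:.2f} min)")
```

Under these assumptions the amplification program holds for about 33 minutes and the cycle-sequencing program for about 109 minutes, before any ramping overhead.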

Some PCR products that could not be se- 
quenced directly were purified and ligated 
into pUC 18 Sma I/BAP dephosphorylated 
vector using the SureClone Ligation Kit 
(Pharmacia P-L Biochemicals) as described 
in Giribet et al. (1996). Sequencing was then
performed by the dideoxy termination
method (Sanger et al., 1977) using T7 DNA
polymerase (Sequencing Kit from Pharmacia
Biotech). 

All sequences have been deposited in 
GenBank (see taxonomy, sequence length, 
and accession codes in table 1). The new se- 
quences have been compared to other pub- 
lished sequences available from GenBank. 
The terminology used for the secondary 
structure topology follows the nomenclature 
of Van de Peer et al. (1998). 


RESULTS AND DISCUSSION 


Extraordinary 18S rRNA Sequences of 
Metazoans 


After studying the 18S rRNA gene of 
about 400 metazoan taxa (180 of these se- 
quences collected by the authors; table 1), we 
have observed that several animal taxa pos- 
sess large insertions at different regions of 
the molecule. This variation is shown in fig- 
ure 1 and table 1. For example, within the 
phylum Mollusca, the cephalopods Loligo 
pealei (squid), Sepia elegans (cuttlefish), and 
Nautilus scrobiculatus present large inser- 
tions in regions V2, V4, V7, and V9. Other 
groups of molluscs such as the anomalodes- 
matan bivalves present large insertions in re- 
gion V7, and some Archaeogastropoda have 
insertions in regions V2 and V4. But inser- 
tions are not restricted to the phylum Mol- 
lusca. Sea cucumbers (Echinodermata, Hol- 
othuroidea) and arrow-worms (Chaetogna- 
tha) present insertions in region V4; some 
leeches (Annelida, Hirudinea) present inser- 
tions in region V7; some parasitic planarians 
(Platyhelminthes) present insertions both in 
regions V4 and V7; and velvet worms (On- 
ychophora) present insertions in regions V2, 
11, E23-7, V7, and V9. Within the arthro- 
pods, there are many extraordinary cases 
among insects (see the strepsipteran
sequences reported by Chalwatzis et al., 1995,
1996; Whiting et al., 1997) and certain
crustacean groups. But the most bizarre case
within arthropods (and perhaps within the
entire Metazoa) is that of the myriapods.


The 18S rRNA Gene of the Myriapods 


TABLE 1
List of the 180 Species of 18S rRNA Sequences Generated by the Authors
(with GenBank accession codes and sequence length [excluding a total of 46 bp from the external
primers 1F and 9R]; asterisks refer to noncomplete sequences, and thus the length is not reported)

Taxon                               GenBank       bp

Annelida (Polychaeta) (3 sp.)
  Eunice torquata                   AF123304      1768
  Dinophilus gyrociliatus           AF119074      1784
  Myzostoma sp.                     AF123305      1770
Phylum Sipuncula (3 sp.)
  Aspidosiphon misakiensis          AF119090      1766
  Themiste alutacea                 AF119075      1757
  Phascolopsis gouldii              AF123306      1765
Phylum Echiura (2 sp.)
  Bonellia viridis                  AF123307      1787
  Urechis sp.                       AF119076      1772
Phylum Nemertea (2 sp.)
  Prostoma eilhardi                 U29494        1790
  Amphiporus sp.                    AF119077      1778
Phylum Mollusca (61 sp.)
 Polyplacophora (2 sp.)
  Lepidopleurus cajetanus           AF120502      1761
  Acanthochitona sp.                AF120503      1763
 Cephalopoda (3 sp.)
  Nautilus scrobiculatus            AF120504      2485
  Loligo pealei                     AF120505      2221
  Sepia elegans                     AF120506-7*
 Gastropoda (14 sp.)
  Cocculina messingi                AF120508      1775
  Entemnotrochus adansonianus       AF120509      1991
  Perotrochus midas                 AF120510      1986
  Haliotis tuberculata              AF120511      1809
  Sinezona confusa                  AF120512      1810
  Diodora graeca                    AF120513      1855
  Clanculus cruciatus               AF120514      1733
  Theodoxus fluviatilis             AF120515      1767
  Viviparus georgianus              AF120516      1795
  Truncatella guerinii              AF120517      1834
  [entry illegible]
  [entry illegible]
  Rissoella caribea                 AF120520      2239
  Discodoris atromaculata           AF120521      1858
 Scaphopoda (2 sp.)
  Dentalium pilsbryi                AF120522      1804
  Rhabdus rectius                   AF120523      1810
 Bivalvia (40 sp.)
  Solemya velum                     AF120524      1771
  Nucula sulcata                    AF120525      1765
  Nucula proxima                    AF120526      1766
  Acila castrensis                  AF120527      1765
  Yoldia limatula                   AF120528      1767
  Nuculana minuta                   AF120529      1770
  Lithophaga lithophaga             AF120530      1767
  Striarca lactea                   AF120531      1765
  Pteria hirundo                    AF120532      1775
  Lima lima                         AF120533      1771
  Limaria hians                     AF120534      1767
  Anomia ephippium                  AF120535      1763
  Psilunio littoralis               AF120536      1766
  Lampsilis cardium                 AF120537      1765
  Neotrigonia bednalli              AF120538      1765
  Pandora sp.                       AF120539      2143
  [name illegible]                  AF120540      1986
  [name illegible]                  AF120541-2*
  Cardiomya costellata              AF120543      1804
  Myonera sp.                       AF120544      1884
  Chama gryphoides                  AF120545*
  Codakia cfr. orbiculata           AF120546*
  Galeomma turtoni                  AF120547      1775
  Lasaea sp.                        AF120548      1774
  Cardita calyculata                AF120549      1727
  Cardites antiquata                AF120550      1775
  Astarte castanea                  AF120551      1775
  Dreissena polymorpha              AF120552      1782
  Parvicardium exiguum              AF120553*
  Abra sp.                          AF120554      1770
  Ensis ensis                       AF120555      1765
  Calyptogena magnifica             AF120556      1777
  Corbicula fluminea                AF120557      1777
  Sphaerium striatinum              AF120558      1781
  Mercenaria mercenaria             AF120559      1779
  Mya arenaria                      AF120560      1783
  Varicorbula dissimilis            AF120561      1795
  Gastrochaena dubia                AF120562      1777
  Hiatella arctica                  AF120563      1774
  Bankia carinata                   AF120564      1791
Phylum Brachiopoda (1 sp.)
  Argyrotheca cordata               AF119078      1762
Phylum Phoronida (2 sp.)
  Phoronis australis                AF119079      1767
  Phoronopsis viridis               AF123308      1765
Phylum Bryozoa (3 sp.)
  Lichenopora sp.                   AF119080      1785
  Membranipora sp.                  AF119081      1761
  Caberea boryi                     AF119082      1772
Nemertodermatida (1 sp.)
  Meara stichopi                    AF119085      1768
Phylum Priapula (1 sp.)
  Tubiluchus corallicola            AF119086      1768
Phylum Onychophora (2 sp.)
  Peripatopsis capensis             AF119087      2174
  Epiperipatus biolleyi             AFXXXXXX*
Phylum Tardigrada (1 sp.)
  Macrobiotus hufelandi             X81442        1762
Phylum Arthropoda
 Chelicerata (49 sp.)
  Achelia echinata                  AF005438      1795
  Callipallene sp.                  AF005439      1787
  Endeis laevis                     AF005441      1791
  Colossendeis sp.                  AF005440      1793
  Limulus polyphemus                U91490        1759
  Carcinoscorpius rotundicaudatus   U91491        1759
  Belisarius xambeui                AF005442      1761
  Pseudocellus pearsei              U91489        1763
  Ricinoididae sp.                  AF124930      1757
  Gluvia dorsalis                   AF007103      1761
  Eusimonia wunderlichi             U29492        1762
  Chanbria regalis                  AF124931      1763
  Stenochrus portoricensis          AF005444      1759
  Trithyreus pentapeltis            AF124932      1761
  Mastigoproctus giganteus          AF005446      1760
  Paraphrynus sp.                   AF005445      1760
  Amblypygidae sp.                  AF124933      1759
  Liphistius bicoloripes            AF007104      1762
  Nesticus celullanus               AF005447      1763
  Roncus cfr. pugnax                AF005443      1761
  Americhernes sp.                  AF124934      1762
  Opilioacarus texanus              AF124935      1763
  Siro rubens                       U36998        1763
  Parasiro coiffaiti                U36999        1761
  Stylocellus n.sp.                 U91485        1763
  Dalquestia formosa                AF124936      1761
  Odiellus troguloides              X81441        1759
  Opilio parietinus                 AF124938      1761
  Astrobunus grallator              AF124939      1761
  Nelima sylvatica                  U91486        1762
  Leiobunum sp.                     AF124940      1761
  Hadrobunus cfr. maculosus         AF124941      1761
  Caddo agilis                      U91487        1766
  Ischyropsalis luteipes            U37000        1758
  Hesperonemastoma modestum         AF124942      1762
  Sabacon cavicolens                AF124944      1762
  Dicranolasma soerenseni           U37001        1756
  Centetostoma dubium               U37002        1758
  Nemastoma bimaculatum             AF124947      1758
  Equitius doriae                   U37003        1762
  Triaenobunus sp.                  AF124950      1763
  Zuma acuta                        AF124951      1762
  Oncopus cfr. alticeps             U91488        1762
  Scotolemon lespesi                U37005        1760
  Maiorerus randoi                  U37004        1763
  Bishopella laciniosa              AF124952      1763
  Gnidia holmbergii                 U37006        1760
  Pachyloides thorellii             U37007        1761
  Hoplobunus sp.                    AF124953      1762
 Hexapoda (11 sp.)
  Podura aquatica                   AF005452      1761
  Acerentulus traeghardi            AF005453      1955
  Campodea tillyardi                AF173234      1851
  Campodeidae sp.                   AF005455*
  Catajapyx sp.                     AF005456*
  Dilta littoralis                  AF005457      1792
  Machiloides sp.                   AFXXXXXX      1790
  Lepisma sp.                       AF005458      1785
  Thermobius sp.                    AFXXXXXX      1788
  Tricholepidion gertschi           AFXXXXXX      1809
 Myriapoda (38 sp.)
  Cylindroiulus punctatus           AF005448      1785
  Polydesmus coriaceus              AF005449      1783
  Scutigerella sp.1                 AF007106      1299
  Scutigerella sp.2                 AF005450*
  Hanseniella sp.                   AF173237*
  Pauropodidae sp.                  AF005451      2182
  Scutigera coleoptrata             AF000772      1819
  Thereuopoda clunifera             AF119088      1817
  Allothereua maculata              AF173240*
  Lithobius variegatus              AF000773      1814
  Australobius scabrior             AF173241      1815
  Paralamyctes n.sp.                AF173242      1818
  Lamyctes emarginatus              AF173244      2099
  Henicops maculatus                AF173245      2231
  Anopsobius n. sp.                 AF173247      1944
  Craterostigmus tasmanianus        AF000774      1814
  Scolopendra cingulata             U29493        1841
  Cormocephalus monteithi           AF173249      1842
  Ethmostigmus rubripes             AF173250      1844
  Alipes sp.                        AF173251      1844
  Rhysida nuda                      AF173252      1846
  Cryptops trisulcatus              AF000775      1819
  Theatops erythrocephala           AF000776      1818
  Scolopocryptops nigridus          AF173253      1817
  Mecistocephalus sp.               AF173254      1820
  Pseudohimantarium mediterraneum   AF000778      2157
  Henia (Chaetechelyne) vesuviana   AF173255      2194
  Pectiniunguis argentinensis       AF173256      2006
  Schendylops pampeanus             AF173257      2053
  Ballophilus australiae            AF173258      2108
  Clinopodes cfr. poseidonis        AF000777      2224
  Tasmanophilus sp.                 AF173259      1930
  Tuoba sydneyensis                 AF173260      2083
  Zelanion antipodus                AF173261      2194
  Zelanion sp.                      AF173262      2224
  Ribautia n. sp.                   AF173263      2218
  Aphilodon weberi                  AF173264      2015
  Strigamia maritima                AF173265      2122
Phylum Enteropneusta (1 sp.)
  Glossobalanus minutus             AF119089      1776


Fig. 1. Schematic representation of the 18S rRNA locus. The gray squares represent the variable
regions V2, V4, V7, and V9 with insertions (V2: Onychophora, Geophilomorpha, Cephalopoda,
Archaeogastropoda; V4: Hexapoda, Crustacea, Pauropoda, Holothuroidea, Chaetognatha, Platyhelminthes,
Cephalopoda; V7: Onychophora, Hexapoda, Crustacea, Pauropoda, Chilopoda, Platyhelminthes,
Hirudinea, Cephalopoda, Gastropoda; V9: Onychophora, Crustacea, Cephalopoda). The black arrowheads
represent particular insertions (10: Pauropoda; 11: Onychophora; E23-7: Onychophora and Pauropoda;
E23-8: Pauropoda; 29: Pauropoda; 46: Protura). The black bar represents the 500 bp deletion of the
Symphyla.


Myriapods comprise four groups of
terrestrial arthropods. Centipedes (class
Chilopoda) and millipedes (class Diplopoda) are the
two principal classes of myriapods. The two 
other classes of myriapods lack common 
names: Symphyla and Pauropoda. Prior to 
the analysis of Edgecombe et al. (1999), the 
only complete 18S rRNA sequence data for
myriapods available at GenBank were those of eight
centipedes and two millipedes (Giribet et al.,
1996, 1999; Giribet and Ribera, 1998). Two 
centipede species of the order Geophilomor- 
pha (Clinopodes poseidonis and Pseudohi- 
mantarium mediterraneum) present an inser- 
tion of about 300 bp at region V7, whereas 
all the other available sequences are fairly 
conserved in terms of primary sequence. 
However, a wider ongoing study on the 18S 
rRNA gene of myriapods suggests that they 
constitute one of the most interesting cases 
of 18S rRNA variation in any metazoan 
group. 

Within the centipedes, the members of the 
order Geophilomorpha (15 species studied
belonging to 9 families), excluding two
species of the most basal family
Mecistocephalidae (Edgecombe et al., 1999), exhibit
insertions of about 300 bp in region V7. This
is an unusual example in which the point at
which the insertion occurred can be placed
exactly on the phylogeny, and it illustrates the
potential phylogenetic
information of such insertions (fig. 2).
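
The size of this length jump can be read directly from Table 1. The short sketch below, offered only as an illustration, compares the total 18S rRNA length of each geophilomorph with that of the single mecistocephalid in the table. The lengths are copied from Table 1; the variable names are hypothetical. Whole-gene length difference is only a rough proxy for the V7 insertion, because other regions (for example V2) also vary in length.

```python
# 18S rRNA lengths (bp) copied from Table 1 for geophilomorph centipedes.
MECISTOCEPHALUS_BP = 1820  # Mecistocephalus sp., basal family Mecistocephalidae

other_geophilomorphs_bp = {
    "Pseudohimantarium mediterraneum": 2157,
    "Henia (Chaetechelyne) vesuviana": 2194,
    "Pectiniunguis argentinensis": 2006,
    "Schendylops pampeanus": 2053,
    "Ballophilus australiae": 2108,
    "Clinopodes cfr. poseidonis": 2224,
    "Tasmanophilus sp.": 1930,
    "Tuoba sydneyensis": 2083,
    "Zelanion antipodus": 2194,
    "Zelanion sp.": 2224,
    "Ribautia n. sp.": 2218,
    "Aphilodon weberi": 2015,
    "Strigamia maritima": 2122,
}

# Excess length of each species relative to Mecistocephalus sp.
excess_bp = {
    taxon: length - MECISTOCEPHALUS_BP
    for taxon, length in other_geophilomorphs_bp.items()
}

for taxon, extra in sorted(excess_bp.items(), key=lambda item: item[1]):
    print(f"{taxon:35s} +{extra} bp")

print("range of excess:", min(excess_bp.values()), "to", max(excess_bp.values()), "bp")
```

All 13 non-mecistocephalid geophilomorphs in the table are longer than Mecistocephalus sp., by between 110 and 404 bp in total gene length.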

Within the millipedes, members of the
family Polyzonidae display sequences longer
than 2700 bp. Data for one species of
Pauropoda show that the 18S rRNA is ca. 2200
bp, with several small insertions (Giribet,
1997; Giribet and Ribera, 2000). But perhaps
the most unusual case among metazoan 18S
rRNA sequences is the class Symphyla.
Amplification of the 18S rRNA loci of three
species belonging to two genera (two species of
Scutigerella from northeastern Spain and the 
Canary Islands, respectively, and one species 
of Hanseniella from Australia) yielded a 
product band size of about 1350 bp. Se- 
quencing this fragment suggests a deletion of 
about 500 bp in the central region of the mol- 
ecule. 

Fig. 2. Phylogenetic tree of the centipedes based on the combined analysis of Edgecombe et al.
(1999). The arrow indicates where the insertion of ca. 300 bp at region V7 occurred during the evolution
of centipedes.

Although it might be conjectured that in
this case a nonfunctional pseudogene has
been sequenced, as occurred with the 18S
rRNA locus of the platyhelminth Dugesia
mediterranea (Carranza et al., 1996) and
other dugesiids (Carranza et al., 1998a, 1998b),
this seems improbable for several reasons.
First, this sequence has been obtained from
three different species and in two
independent laboratories. Second, none of the highly
conserved primers from the “deleted” region
(forward primers 4F, 18Sa0.7, 18Sa0.79,
18Sa1.0; reverse primers 4R, 18Sb5.0,
18Sb3.9, 18Sb3.0 [Giribet et al., 1996;
Whiting et al., 1997]) amplified any DNA
fragment when combined with primers from
other regions. If the 1350 bp fragment were a
pseudogene, we would expect to amplify
fragments of the original gene when using
the conserved primers located within the
“deleted” region. Third, phylogenetic
analyses including the symphylan sequences
show that symphylans are arthropods related
to other myriapods (Giribet, 1997; fig. 3).
Fourth, amplification of DNA from an RNA 
source, as described by Carranza et al. 
(1998a), yielded a product band size of ap- 
proximately 1350 bp, as expected. These 
facts demonstrate that a deletion of ca. 500 
bp occurred in the common ancestor of these 
three symphylan species. 
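
The figures quoted for the symphylans are mutually consistent, as the following arithmetic sketch suggests. It uses only numbers stated in this paper (the Table 1 length for Scutigerella sp.1, the 46 bp of external primers excluded from Table 1, the typical gene length from the Introduction, and the approximate deletion size); the variable names are hypothetical.

```python
EXTERNAL_PRIMER_BP = 46           # primers 1F and 9R, excluded from Table 1 lengths
SCUTIGERELLA_SP1_BP = 1299        # Table 1 length for Scutigerella sp.1
TYPICAL_RANGE_BP = (1800, 1900)   # typical SSU rRNA gene length (Introduction)
DELETION_BP = 500                 # approximate size of the symphylan deletion

# Expected amplicon size once the primer sequences are added back.
amplicon_bp = SCUTIGERELLA_SP1_BP + EXTERNAL_PRIMER_BP

# Length expected for a typical gene that has lost about 500 bp.
expected_low, expected_high = (n - DELETION_BP for n in TYPICAL_RANGE_BP)

print(f"amplicon: {amplicon_bp} bp")
print(f"typical length minus deletion: {expected_low} to {expected_high} bp")
print("consistent:", expected_low <= amplicon_bp <= expected_high)
```

The reconstructed amplicon of 1345 bp agrees with the observed band of about 1350 bp and falls within the 1300 to 1400 bp expected for a typical gene lacking about 500 bp.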


18S rRNA Variation in Metazoans 


It seems that large insertions and deletions
are not as constrained as was previously
thought (e.g., Crease and Colbourne, 1998).
These events occur in many metazoan taxa,
most commonly in regions V2, V4, V7,
and V9. Other parts of region 23 (which
includes region V4) are also variable. In a
phylogenetic study of about 150 arthropod
18S rRNA sequences (Giribet and Ribera,
2000), insertions at region E23-7 were
observed in Onychophora and in Pauropoda,
while insertions at region E23-8 were
observed in the pauropod species. Other
insertions observed in particular taxa occur at
sites 8 (in the millipede Polyzonium), 11 (in
Onychophora), 29 (in Pauropoda), and 46 (in
the proturan Acerentulus traeghardi). How- 
ever, we only obtained sequences from one 
pauropod and one proturan, hence these re- 
sults cannot be generalized to other members 
of such groups. 

Certain taxa present insertions in variable 
regions whereas in the remaining regions the 
primary sequence may be conserved. For ex- 
ample, certain geophilomorph centipedes ex- 
hibit a small insertion at region V2 (between 
10 and 80 bp compared to other centipedes) 
and a large insertion (about 300 bp) at region 
V7, whereas the remaining positions are
conserved with respect to other centipedes.
Other taxa present not only insertions in the
variable regions but also substantial changes
in the primary sequence of the remaining
regions. This is the case in the cephalopods,
which have insertions in regions V2, V4, V7, 
and V9, and differ considerably from other 
molluscs in the primary sequence of the re- 
maining regions. 

Fig. 3. Phylogenetic tree based on 18S rRNA sequence data indicating the position of two
symphylans (box) with respect to other myriapods (underlined taxa) in a phylogenetic analysis of arthropods
(from Giribet, 1997). The two symphylans appear related to other myriapods.

Although several metazoans present
insertions in the 18S rRNA, reduction of the 18S
rRNA gene appears to be a rare event in
evolution. To our knowledge, there are no other
reported cases of 18S rDNA sequences with
a deletion of the magnitude of that observed
in the symphylans (ca. 500 bp). The deletion 
corresponds to the central region of the
molecule (approximately from region 14 to
E23-9). The remaining primary sequence is fairly
conserved, and we assume that the molecule
may be functional. However, reconstruction of a
global secondary structure cannot be conducted
using the comparative method. Another case of
sequence length reduction in the 18S rRNA 
locus occurs in the Dicyemid mesozoans 
(three species of the genus Dicyema), with a 
total gene sequence of about 1670 bp that 
presents two major deletions at the variable 
sites V2 and V9. 

The geophilomorphan centipedes that pre- 
sent the insertion of about 300 bp at region 
V7 (13 species) also display a large insertion 
(about 300 bp) at the D3 expansion fragment 
of the large subunit rRNA locus (28S rRNA). 
Neither the insertion at region V7 of the
18S rRNA nor that at the D3 expansion fragment
of the 28S rRNA locus has been found in
the putative most basal geophilomorph
family Mecistocephalidae (Giribet et al., 1999;
Edgecombe et al., 1999). This apparent cor- 
relation between the insertions at region V7 
of the 18S rRNA locus and at the D3 expan- 
sion fragment of the 28S rRNA locus (also 
observed in the cephalopod Loligo) could 
suggest a possible interaction between these 
subunits in the ribosome. 


18S rRNA Variable Sites and Phylogenetic 
Analyses 


The presence of certain variable sites in
the 18S rRNA molecule can hinder
phylogenetic analyses at the alignment step, a fact
emphasized by several researchers who
avoid the use of ribosomal genes for drawing
inferences (e.g., Ayala et al., 1998). In
general, researchers using ribosomal genes
exclude the variable regions from their
phylogenetic inference step because of
uncertainties in alignments (e.g., Giribet et al., 1996,
1999). But since data removal from phylo- 
genetic analyses is also problematic (see Ga- 
tesy et al., 1993), and automatic alignments 
are explicit, other researchers prefer the use 
of automatic alignments exploring different 
cost matrices (e.g., Wheeler, 1995).
Regardless of which of these options is the best
(philosophically, computationally, or
practically), the extreme length variation of some
of the SSU rRNA helices may constitute a 
serious problem in the phylogenetic analysis 
of certain data sets. 

A clear example is illustrated in figure 4, 
where we show the variable region V7 of 17 
centipede taxa (see table 2 for the taxono- 
my). The fragment as a whole can be con- 
sidered homologous because it is located be- 
tween two conserved regions that constitute 
a stem. The most common sequence length 
for the V7 region in centipedes ranges be- 
tween 65 and 70 bp (character state present 
in the five recognized orders of centipedes). 
However, three taxa (Scolopendra,
Ethmostigmus, and Alipes; family Scolopendridae)
display sequences between 93 and 94 bp, and 
four taxa (Pseudohimantarium, Henia, Cli- 
nopodes, and Zelanion; belonging to four 
families of Geophilomorpha) exhibit se- 
quences between 354 and 384 bp. These two 
clades are well defined morphologically and 
could also be characterized in terms of se- 
quence length or perhaps secondary structure 
topology. However, a standard phylogenetic 
analysis of this fragment cannot be conduct- 
ed successfully when base-to-base homology 
is required, as is the case with multiple se- 
quence alignments. This situation is frustrat- 
ing since the sequence data clearly display 
historical information that cannot be used 
phylogenetically. 
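
The three length classes described above can be written down as a simple lookup, which shows that sequence length alone already separates the two clades even though the bases cannot be aligned. This is a minimal sketch: the ranges are those quoted in the text, while the function name and the example lengths are hypothetical and are not measured values.

```python
from collections import defaultdict

# Length classes for region V7 quoted in the text (bp, inclusive).
V7_CLASSES = [
    ("common centipede length", 65, 70),
    ("Scolopendridae", 93, 94),
    ("Geophilomorpha with insertion", 354, 384),
]

def v7_length_class(length_bp):
    """Return the label of the length class that contains length_bp."""
    for label, low, high in V7_CLASSES:
        if low <= length_bp <= high:
            return label
    return "outside the ranges quoted in the text"

# Hypothetical lengths for illustration only; not measured values.
example_lengths = {"taxon_A": 67, "taxon_B": 93, "taxon_C": 370, "taxon_D": 120}

by_class = defaultdict(list)
for taxon, length_bp in example_lengths.items():
    by_class[v7_length_class(length_bp)].append(taxon)

for label, taxa in by_class.items():
    print(f"{label}: {', '.join(taxa)}")
```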

A new method to analyze DNA sequence 
data that does not require base-to-base cor- 
respondences was recently developed by 
Wheeler (1999) and is discussed in greater 
detail there. Briefly, the method, named
“fixed character states”, optimizes DNA
sequence data without employing multiple
sequence alignments by treating entire
homologous stretches of sequence data as
characters. The set of specific sequences exhibited
by the terminal taxa constitutes the character 
states. Thus the number of states is equal to 
the number of unique sequences (or homol- 
ogous fragments) exhibited by the data. In 
the example illustrated here, there is one 
character (region V7) with 17 states (as many 
as different taxa). Other situations could arise 
where the number of states would be smaller 
than the number of taxa if two or more were 
to share identical sequences. The salient feature
of this method is the treatment of length
variation. Since the sequence variation is
expressed through a series of transformations
between states, indel or "gap" variation only
occurs as transformations between sequences,
not globally among all the sequence data.
This has the effect of moderating the
difficulties presented by extreme nucleotide
length variation at the expense of treating
strings of bases as character states instead of
individual nucleotides.

Fig. 4. Variable region (V7) of the 18S rRNA locus of 17 species of centipedes.
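
The state coding described above, in which each distinct fragment counts as one state of a single character, can be illustrated with a minimal sketch. The function name `code_fragment_states` and the toy fragments are hypothetical and are not taken from figure 4; the sketch only shows that taxa sharing an identical fragment receive the same state, so the number of states can be smaller than the number of taxa.

```python
def code_fragment_states(fragments):
    """Assign one integer state per unique fragment.

    fragments: dict mapping taxon name -> unaligned fragment (string).
    Returns (coding, states): coding maps taxon -> state id, and
    states[i] is the fragment that defines state i.
    """
    state_of_sequence = {}
    coding = {}
    for taxon, seq in fragments.items():
        if seq not in state_of_sequence:
            # First time this exact fragment is seen: open a new state.
            state_of_sequence[seq] = len(state_of_sequence)
        coding[taxon] = state_of_sequence[seq]
    states = list(state_of_sequence)  # insertion order matches state ids
    return coding, states


# Toy data (invented for illustration): four taxa, three distinct fragments.
toy_fragments = {
    "taxonA": "ACGATCG",
    "taxonB": "ACGATCG",  # identical to taxonA, so it shares state 0
    "taxonC": "ACGTTCG",
    "taxonD": "ACG",      # shorter fragment; length differences need no gaps here
}
coding, states = code_fragment_states(toy_fragments)
print(coding)
print(len(states), "states for", len(toy_fragments), "taxa")
```

Note that no alignment is involved at this stage: fragments of different lengths are simply different states.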

TABLE 2
Taxonomy of the Centipede Species
(Chilopoda) Represented in Figures 4 and 5

Order Scutigeromorpha
  F. Scutigeridae          Scutigera coleoptrata
                           Thereuopoda clunifera
Order Lithobiomorpha
  F. Lithobiidae           Lithobius variegatus
                           Australobius scabrior
Order Craterostigmomorpha
  F. Craterostigmidae      Craterostigmus tasmanianus
Order Scolopendromorpha
  F. Scolopendridae        Scolopendra cingulata
                           Ethmostigmus rubripes
                           Alipes crotalus
  F. Cryptopidae           Cryptops trisulcatus
                           Theatops erythrocephala
                           Scolopocryptops nigridus
Order Geophilomorpha
  F. Mecistocephalidae     Mecistocephalus sp.
                           Nodocephalus doii
  F. Himantariidae         Pseudohimantarium mediterraneum
  F. Dignathodontidae      Henia (Chaetechelyne) vesuviana
  F. Geophilidae           Clinopodes cfr. poseidonis
  F. Chilenophilidae       Zelanion antipodus

A matrix of transformation costs is created
to relate the states to one another. The cells
of this matrix are defined as the minimum
transformation cost required between each
pair of states based on insertion-deletion and
base substitution costs (as in the calculation
of an alignment score). The next operation
uses this transformation matrix to diagnose a
specific phylogenetic topology by means of
existing dynamic programming techniques
(Sankoff and Rousseau, 1975) with the number
of states greatly expanded. This method
has been implemented in the computer program
POY (Gladstein and Wheeler, 1997)
specifying the option -fixedstates (available
via anonymous ftp at the site
ftp.amnh.org/pub/molecular/poy/).
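
The two operations just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the POY implementation: the function names, the unit gap and substitution costs, the toy states, and the four-taxon tree are all hypothetical. `edit_cost` fills one cell of the transformation matrix as a minimum-cost pairwise alignment score, and `sankoff_length` applies a Sankoff-style downpass to a given topology, with ancestral states restricted to the observed fragments.

```python
def edit_cost(a, b, gap=1, sub=1):
    """Minimum cost of transforming fragment a into fragment b,
    counting indels (gap) and base substitutions (sub)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else sub)
            cur.append(min(diagonal, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def cost_matrix(states, gap=1, sub=1):
    """Pairwise transformation costs between all states."""
    return [[edit_cost(s, t, gap, sub) for t in states] for s in states]


def sankoff_length(tree, tip_state, cost):
    """Length of a fixed tree under the given state-to-state cost matrix.

    tree: nested tuples whose leaves are taxon names (strings).
    tip_state: dict mapping taxon name -> state id.
    """
    n = len(cost)
    inf = float("inf")

    def downpass(node):
        if isinstance(node, str):
            # A tip can only carry its observed state.
            return [0 if k == tip_state[node] else inf for k in range(n)]
        child_vectors = [downpass(child) for child in node]
        # Cost of assigning state k here: cheapest change to each child.
        return [
            sum(min(cost[k][m] + vec[m] for m in range(n)) for vec in child_vectors)
            for k in range(n)
        ]

    return min(downpass(tree))


# Toy example (invented): three states, four taxa, one fixed topology.
toy_states = ["ACGT", "ACGTT", "AGGT"]
toy_costs = cost_matrix(toy_states)
toy_tips = {"t1": 0, "t2": 1, "t3": 2, "t4": 2}
toy_tree = (("t1", "t2"), ("t3", "t4"))
print(toy_costs)
print(sankoff_length(toy_tree, toy_tips, toy_costs))
```

In this toy case the tree length is 2: one insertion on the branch leading to t2 and one substitution separating the two pairs. A tree search would repeat the downpass over many candidate topologies and keep the shortest.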

To illustrate how the method works em- 
pirically, we have analyzed the 17 sequences 
presented in figure 4 (and table 2) using the 
fixed character states method. The tree ob- 
tained (fig. 5) shows some lack of resolution, 
but also shows certain clades highly consis- 
tent with the current morphology of the 
group, as well as with the molecular analyses 
of Giribet et al. (1999) and Edgecombe et al. 
(1999). Scolopendromorpha is recognized as
a clade that in turn includes the monophyletic
family Scolopendridae, which presents
an insertion of about 25 bp. Another clade is
defined by the insertion of about 300 bp that
groups all geophilomorph species except the
mecistocephalids (Mecistocephalus and
Nodocephalus), the most basal group, which
lacks the insertion. This is encouraging
considering that this topology corresponds to the
analysis of just a few bases from the variable
region V7. Thus, this method facilitates use
of all the information (variable and conserved)
from ribosomal genes.

Fig. 5. Phylogenetic analysis of the data from
fig. 4 using the "fixed character states" method
of Wheeler (1999) implemented in the computer
program POY (Gladstein and Wheeler, 1997).
Commands: poy -fixedstates -noleading
-norandomizeoutgroup -gap 1 -maxtrees 20 -multibuild
10 -seed -1 -slop 2 -checkslop 5. The two circles
illustrate the insertions of the Geophilomorpha
(ca. 300 bp), and the Scolopendridae (ca. 25 bp).


CONCLUSIONS 


Long SSU rRNA appears to be more com- 
mon than claimed by some authors (e.g., 
Crease and Colbourne, 1998) based on its oc- 
currence in at least seven metazoan phyla: 
Platyhelminthes, Mollusca, Onychophora, 
Arthropoda, Chaetognatha, Echinodermata, 
and Mesozoa. However, large deletions ap- 
pear to be rare, having so far only been found 
in one group of arthropods (Symphyla) and 
in mesozoans. Probably many more taxa dis-
play extraordinary 18S rRNA genes, but this
will not be discovered until sampling within 
each phylum increases. Nonetheless, the ex- 
istence of variable regions should not dis- 
courage the use of ribosomal genes in phy- 
logenetic analyses, especially when second- 
ary structure predictions are combined with 
novel methods of DNA sequence data analysis.
In this sense, characterizing secondary
structural features by means of the
comparative method and using these
features (homologous regions) as characters
with multiple states provide a powerful
approach for the analysis of such data using the
fixed character states method. It may be at
such levels that secondary structure information
can best contribute to phylogenetic
analyses.